


For scheduling, related tools etc., see m:Data dumps. 
"WP:DD" redirects here. For Duplication detector, see 
Wikipedia:Duplication detector. 

Wikipedia offers free copies of all available content to in- 
terested users. These databases can be used for mirroring, 
personal use, informal backups, offline use or database 
queries (such as for Wikipedia:Maintenance). All text 
content is multi-licensed under the Creative Commons 
Attribution-Share Alike 3.0 License (CC-BY-SA) and the 
GNU Free Documentation License (GFDL). Images and 
other files are available under different terms, as detailed 
on their description pages. For our advice about comply- 
ing with these licenses, see Wikipedia:Copyrights. 

1 Offline Wikipedia readers 

Some of the many ways to read Wikipedia while offline: 

• XOWA (#XOWA) 

• WikiTaxi #WikiTaxi 

• WikiReader 

• Wikipedia on rockbox #Wikiviewer for Rockbox 

• Wikipedia Featured Articles as a Printed Book
http://www.brandnew.uk.com/wikipedia-as-a-printed-book/

• WikiFilter #WikiFilter 

• Wiki as E-Book #E-book 

• Selected Wikipedia articles as a PDF, OpenDocu- 
ment, etc. Wikipedia:Books 

• Selected Wikipedia articles as a printed book Help: 
Books/Printed books 

• Okawix http://sourceforge.net/p/okawix/code/HEAD/tree/

• ofmne-wikipedia #Ofmne wikipedia reader 

• offline-wiki http://offline-wiki.googlecode.com/git/app.html

• Kiwix 

• iPodLinux 



• BzReader #BzReader and MzReader (for Windows) 

• aarddict #Aard Dictionary 

Some of them are mobile applications — see "list of 
Wikipedia mobile applications". (Listed in reverse al- 
phabetical order, for no particular reason). 

2 Where do I get... 

2.1 English-language Wikipedia 

• Dumps from any Wikimedia Foundation project:
dumps.wikimedia.org

• English Wikipedia dumps in SQL and XML:
dumps.wikimedia.org/enwiki/

• Download the data dump using a BitTorrent 
client (torrenting has many benefits and re- 
duces server load, saving bandwidth costs). 

• pages-articles.xml.bz2 - Current revisions
only, no talk or user pages; this is probably
what you want, and is approximately 10 GB
compressed (expands to over 40 GB when
uncompressed).

• pages-meta-current.xml.bz2 - Current revi- 
sions only, all pages (including talk) 

• abstract.xml.gz - page abstracts 

• all-titles-in-ns0.gz - Article titles only (with
redirects)

• SQL files for the pages and links are also avail- 
able 

• All revisions, all pages: These files expand 
to multiple terabytes of text. Please only 
download these if you know you can cope 
with this quantity of data. Go to Latest 
Dumps and look out for all the files that have 
'pages-meta-history' in their name. 

• To download a subset of the database in XML format,
such as a specific category or a list of articles,
see Special:Export, usage of which is described at
Help:Export.

• Wiki front-end software: MediaWiki.

• Database backend software: You want to download
MySQL.

• Image dumps: See below. 

2.2 Other languages

In the dumps.wikimedia.org directory you will find the
latest SQL and XML dumps for all the projects, not just
English. Some examples follow; others exist, just select the
appropriate language code and the appropriate project:

• Catalan Wikipedia dumps: 
dumps.wikimedia.org/cawiki/ 

• Chinese Wikipedia dumps: 
dumps.wikimedia.org/zhwiki/ 

• English Wikipedia dumps: 
dumps.wikimedia.org/enwiki/ 

• French Wikipedia dumps: 
dumps.wikimedia.org/frwiki/ 

• German Wikipedia dumps: 
dumps.wikimedia.org/dewiki/ 

• Italian Wikipedia dumps: 
dumps.wikimedia.org/itwiki/ 

• Japanese Wikipedia dumps: 
dumps.wikimedia.org/jawiki/ 

• Polish Wikipedia dumps: 
dumps.wikimedia.org/plwiki/ 

• Portuguese Wikipedia dumps: 
dumps.wikimedia.org/ptwiki/ 

• Russian Wikipedia dumps: 
dumps.wikimedia.org/ruwiki/ 

• Spanish Wikipedia dumps: 
dumps.wikimedia.org/eswiki/ 

• Korean Wikipedia dumps:
dumps.wikimedia.org/kowiki/

Some other directories (e.g. simple, nostalgia) exist, with 
the same structure. 
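
The directory pattern above can be sketched in Python. This is a minimal
illustration, not an official client: the https scheme, the "latest"
subdirectory and the "<db>-latest-<file>" naming are assumptions about the
server layout, and the function names are hypothetical.

```python
def dump_dir_url(lang: str, project: str = "wiki") -> str:
    """Return the dump directory for a wiki, e.g. cawiki or enwiktionary."""
    return f"https://dumps.wikimedia.org/{lang}{project}/"


def latest_dump_url(lang: str, project: str = "wiki",
                    filename: str = "pages-articles.xml.bz2") -> str:
    """Return a guessed URL for the latest dump file of one wiki.

    Assumption: files live under <db>/latest/ and are named
    <db>-latest-<filename>.
    """
    db = f"{lang}{project}"
    return f"{dump_dir_url(lang, project)}latest/{db}-latest-{filename}"


if __name__ == "__main__":
    for code in ("ca", "zh", "ko"):
        print(latest_dump_url(code))
```

Check the directory listing in a browser before relying on a generated URL.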

3 Where are the uploaded files (im- 
age, audio, video, etc., files)? 

Images and other uploaded media are available from mir- 
rors in addition to being served directly from Wikime- 
dia servers. Bulk download is currently (as of Septem- 
ber 2013) available from mirrors but not offered directly 
from Wikimedia servers. See the list of current mirrors. 
You should rsync from the mirror, then fill in the missing
images from upload.wikimedia.org. When downloading from
upload.wikimedia.org you should throttle yourself to 1 cache
miss per second (you can check headers on a response to see
if it was a hit or miss and then back off when you get a
miss), and you shouldn't use more than one or two simultaneous
HTTP connections. In any case, make sure you have an accurate
user agent string with contact info (email address) so ops can
contact you if there's an issue. You should be getting
checksums from the MediaWiki API and verifying them. The API
Etiquette page contains some guidelines, although not all of
them apply (for example, because upload.wikimedia.org isn't
MediaWiki, there is no maxlag parameter).
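
The throttling advice above could be sketched as follows. This is only an
illustration under stated assumptions: the response header name "X-Cache"
and its "miss" wording are guesses about how a hit or miss is reported, the
user agent and contact address are placeholders, and all function names are
hypothetical. It uses a single connection and fetches files one at a time.

```python
import time
import urllib.request

# Placeholder identity; replace with your own tool name and contact address.
USER_AGENT = "ExampleMirrorTool/0.1 (contact: ops-contact@example.org)"


def build_request(url: str) -> urllib.request.Request:
    """Attach a descriptive User-Agent so operators can reach you."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def is_cache_miss(headers) -> bool:
    """Assumption: a cache miss is reported in an X-Cache response header."""
    return "miss" in (headers.get("X-Cache") or "").lower()


def pause_after(headers, miss_delay: float = 1.0) -> float:
    """Seconds to wait before the next request: back off only after a miss."""
    return miss_delay if is_cache_miss(headers) else 0.0


def fetch_all(urls):
    """Fetch URLs sequentially, sleeping after each cache miss."""
    for url in urls:
        with urllib.request.urlopen(build_request(url)) as response:
            data = response.read()
            delay = pause_after(response.headers)
        yield url, data
        time.sleep(delay)
```

Verifying each file against a checksum obtained from the MediaWiki API is a
separate step not shown here.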

Unlike most article text, images are not necessarily li- 
censed under the GFDL & CC-BY-SA-3.0. They may 
be under one of many free licenses, in the public do- 
main, believed to be fair use, or even copyright infringe- 
ments (which should be deleted). In particular, use of 
fair use images outside the context of Wikipedia or sim- 
ilar works may be illegal. Images under most licenses 
require a credit, and possibly other attached copyright 
information. This information is included in image de- 
scription pages, which are part of the text dumps avail- 
able from dumps.wikimedia.org. In conclusion, down- 
load these images at your own risk (Legal) 

4 Dealing with compressed files 

Dump files are heavily compressed, so once uncompressed they
will take up large amounts of drive space. The following
programs can be used to uncompress bzip2 (.bz2) and .7z files.

Windows 

Windows does not ship with a bzip2 decompressor pro- 
gram. The following can be used to decompress bzip2 
files. 

• bzip2 (command-line) (from here) is available for 
free under a BSD license. 

• 7-Zip is available for free under an LGPL license. 

• WinRAR 

• WinZip 

Mac 

• OS X ships with the command-line bzip2 tool.

GNU/Linux

• GNU/Linux ships with the command-line bzip2 
tool. 

BSD 

• Some BSD systems ship with the command-line 
bzip2 tool as part of the operating system. Others, 
such as OpenBSD, provide it as a package which 
must first be installed. 



Notes

1. Some older versions of bzip2 may not be able to handle
files larger than 2 GB, so make sure you have the latest
version if you experience any problems.

2. Some older archives are compressed with gzip, 
which is compatible with PKZIP (the most common 
Windows format). 
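
Because an uncompressed dump takes so much drive space, it can be useful
to read a .bz2 file as a stream instead of expanding it first. The sketch
below uses Python's standard bz2 module; the function name and the idea of
counting lines are only an illustration.

```python
import bz2


def count_lines(path: str) -> int:
    """Count text lines in a .bz2 file without writing the expanded file."""
    with bz2.open(path, "rt", encoding="utf-8") as stream:
        # The file is decompressed block by block as it is iterated.
        return sum(1 for _ in stream)
```

The same pattern works for any line-by-line processing of a dump; only a
small window of decompressed data is held in memory at a time.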



5 Dealing with large files

As files grow in size, so does the likelihood that they will
exceed some limitation of a computing device. Each operating
system, file system, hard storage device, and software
application has a maximum file size limit. These limits will
likely differ from one another, and the lowest of them becomes
the effective file size limit for a storage device.

The older the software in a computing device, the more likely
it will have a 2 GB file limit somewhere in the system. This is
due to older software using 32-bit integers for file indexing,
which limits file sizes to 2^31 bytes (2 GB) for signed
integers, or 2^32 bytes (4 GB) for unsigned integers. Older C
programming libraries have this 2 or 4 GB limitation, but the
newer file libraries have been converted to 64-bit integers,
thus supporting file sizes up to 2^63 or 2^64 bytes (8 or
16 EB).

Before starting a download of a large file, check the storage
device to ensure its file system can support files of such a
large size, and check the amount of free space to ensure that
it can hold the downloaded file.

5.1 File system limits

There are two limits for a file system: the file system size
limit, and the file size limit. In general, since the file size
limit is less than the file system limit, the larger file
system limits are a moot point. A large percentage of users
assume they can create files up to the size of their storage
device, but are wrong in their assumption. For example, a 16 GB
storage device formatted as a FAT32 file system has a limit of
4 GB for any single file. The following is a list of the most
common file systems; see Comparison of file systems for
additional detailed information.

Windows

• FAT16 supports files up to 4 GB. FAT16 is the factory
format of smaller USB drives and all SD cards that are
2 GB or smaller.

• FAT32 supports files up to 4 GB. FAT32 is the factory
format of larger USB drives and all SDHC cards that are
4 GB or larger.

• exFAT supports files up to 127 PB. exFAT is the factory
format of all SDXC cards, but is incompatible with most
flavors of UNIX due to licensing problems.

• NTFS supports files up to 16 TB. NTFS is the default
file system for Windows computers, including Windows
2000, Windows XP, and all their successors to date.

• ReFS supports files up to 16 EB.

Mac

• HFS+ supports files up to 8 EB on Mac OS X 10.2+
and iOS. HFS+ is the default file system for OS X
computers.

Linux

• ext2 and ext3 support files up to 16 GB, but up to
2 TB with larger block sizes. See
http://www.suse.com/~aj/linux_lfs.html for more information.

• ext4 supports files up to 16 TB using a 4 KB block
size (limitation removed in e2fsprogs-1.42, 2012).

• XFS supports files up to 8 EB.

• ReiserFS supports files up to 1 EB (8 TB on 32-bit
systems).

• JFS supports files up to 4 PB.

• Btrfs supports files up to 16 EB.

• NILFS supports files up to 8 EB.

• YAFFS2 supports files up to 2 GB.

FreeBSD

• ZFS supports files up to 16 EB.

5.2 Operating system limits

Each operating system has internal file system limits for
file size and drive size, which are independent of the file
system or physical media. If the operating system has any
limits lower than the file system or physical media, then
the O/S limits will be the real limit.

Windows



• For Windows 95/98/ME, there is a 4 GB limit for 
all file sizes. 

• For Windows XP, there is a 16 EB limit for all file 
sizes. 
• For Windows 7, there is a 16 TB limit for all file
sizes.

• For Windows 8/Server 2012, there is a 256 TB limit 
for all file sizes. 

Linux 

• For 32-bit Kernel 2.4.x systems, there is a 2 TB limit 
for all file systems. 

• For 64-bit Kernel 2.4.x systems, there is an 8 EB 
limit for all file systems. 

• For 32-bit Kernel 2.6.x systems without option 
CONFIG_LBD, there is a 2 TB limit for all file sys- 
tems. 

• For 32-bit Kernel 2.6.x systems with option
CONFIG_LBD and all 64-bit Kernel 2.6.x systems, there
is an 8 ZB limit for all file systems.[1]

Google Android 

Google Android is based upon Linux, which determines 
its base limits. 

• Internal Storage: 

• Android 2.3 and later use the ext4 file
system.[2]

• Android 2.2 and earlier use the YAFFS2
file system.

• External Storage Slots: 

• All Android devices should support the FAT16,
FAT32 and ext2 file systems.

• Android 2.3 and later supports the ext4 file system.

Apple iOS (see List of iOS devices)

• All devices support HFS+ for internal storage. No 
devices have external storage slots. 

5.3 Tips 

5.3.1 Detect corrupted files 

It is a good idea to check the MD5 sums (provided in a 
file in the download directory) to make sure your down- 
load was complete and accurate. You can check this by 
running the "md5sum" command on the files you down- 
loaded. Given how large the files are, this may take some 
time to calculate. Due to the technical details of how files 
are stored, file sizes may be reported differently on differ- 
ent filesystems, and so are not necessarily reliable. Also, 
you may have experienced corruption during the down- 
load, though this is unlikely. 
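
Where the md5sum command is not available, the same check can be sketched
in Python with the standard hashlib module. The function names are
illustrative; the expected value is whatever the checksum file in the
download directory lists for your file.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large dumps never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare against a published checksum, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

As with md5sum, hashing a multi-gigabyte dump may take some time.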



5.3.2 Reformatting external USB drives 

If you plan to download Wikipedia Dump files to one 
computer and use an external USB Flash Drive or Hard 
Drive to copy them to other computers, then you will 
run into the 4 GB FAT32 file size limitation issue. To 
work around this issue, reformat the >4 GB USB Drive 
to a file system that supports larger file sizes. If you are 
working exclusively with Windows XP/Vista/7 comput- 
ers, then reformat your USB Drive to NTFS file system. 
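
Before copying a dump to an external drive, a quick check can show whether
the file fits. The sketch below assumes the target is FAT32, whose largest
single file is 4 GiB minus one byte; the function names are hypothetical,
and the script cannot detect the drive's actual file system, so pass a
different limit if the drive uses another format.

```python
import os
import shutil

# FAT32 cannot hold a single file of 4 GiB or more.
FAT32_MAX_FILE = 4 * 1024**3 - 1


def can_store(file_size: int, free_bytes: int,
              max_file_size: int = FAT32_MAX_FILE) -> bool:
    """True if the file is under the per-file limit and fits in free space."""
    return file_size <= max_file_size and file_size <= free_bytes


def check_target(dump_path: str, target_dir: str,
                 max_file_size: int = FAT32_MAX_FILE) -> bool:
    """Check a real file against a real directory on the target drive."""
    size = os.path.getsize(dump_path)
    free = shutil.disk_usage(target_dir).free
    return can_store(size, free, max_file_size)
```

A 10 GB dump fails this check on FAT32 regardless of free space, which is
the situation described above.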

5.3.3 Linux and Unix 

If you seem to be hitting the 2 GB limit, try using wget
version 1.10 or greater, cURL version 7.11.1-1 or greater,
or a recent version of lynx (using -dump). Also, you can
resume downloads (for example wget -c).
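
Resuming works by asking the server only for the bytes that are still
missing. The sketch below builds such a request with Python's standard
library. It assumes the server honours HTTP Range requests, and the
function name is illustrative.

```python
import os
import urllib.request


def resume_request(url: str, partial_path: str):
    """Build a request for the part of the file not yet on disk.

    Returns the request and the byte offset to continue from.
    """
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask for everything from the current end of the partial file.
        request.add_header("Range", f"bytes={offset}-")
    return request, offset
```

When the offset is non-zero, the response body should be appended to the
partial file only if the server answers with status 206 (Partial Content);
otherwise the download must restart from the beginning.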

6 Why not just retrieve data from 
wikipedia.org at runtime? 

Suppose you are building a piece of software that 
at certain points displays information that came from 
Wikipedia. If you want your program to display the in- 
formation in a different way than can be seen in the live 
version, you'll probably need the wikicode that is used to 
enter it, instead of the finished HTML. 

Also if you want to get all of the data, you'll probably want 
to transfer it in the most efficient way that's possible. The 
wikipedia.org servers need to do quite a bit of work to 
convert the wikicode into HTML. That's time consuming 
both for you and for the wikipedia.org servers, so simply 
spidering all pages is not the way to go. 

To access any article in XML, one at a time, access 
Special:Export/Title of the article. 

Read more about this at Special:Export.
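
As a small illustration, the address for a single article can be built
like this. The en.wikipedia.org host and the /wiki/ path prefix are
assumptions, and the function name is hypothetical.

```python
from urllib.parse import quote


def export_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Return the Special:Export address for one article title."""
    # Spaces become underscores; other unsafe characters are percent-encoded.
    page = quote(title.replace(" ", "_"))
    return f"https://{host}/wiki/Special:Export/{page}"


if __name__ == "__main__":
    print(export_url("Albert Einstein"))
```

This is meant for fetching the occasional article, not for bulk retrieval.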

Please be aware that live mirrors of Wikipedia that are 
dynamically loaded from the Wikimedia servers are pro- 
hibited. Please see Wikipedia:Mirrors and forks. 

6.1 Please do not use a web crawler 

Please do not use a web crawler to download large num- 
bers of articles. Aggressive crawling of the server can 
cause a dramatic slow-down of Wikipedia. 

6.1.1 Sample blocked crawler email 

IP address nnn.nnn.nnn.nnn was retrieving up
to 50 pages per second from wikipedia.org
addresses. Robots.txt has a rate limit of
one per second set using the Crawl-delay
setting. Please respect that setting. If you
must exceed it a little, do so only during the
least busy times shown in our site load graphs at
stats.wikimedia.org/EN/ChartsWikipediaZZ.htm.
It's worth noting that to crawl the whole site
at one hit per second will take several weeks.
The originating IP is now blocked or will
be shortly. Please contact us if you want it
unblocked. Please don't try to circumvent it -
we'll just block your whole IP range.

If you want information on how to get our
content more efficiently, we offer a variety
of methods, including weekly database dumps
which you can load into MySQL and crawl
locally at any rate you find convenient. Tools are
also available which will do that for you as
often as you like once you have the infrastructure
in place.

Instead of an email reply you may prefer to visit
#mediawiki connect at irc.freenode.net to discuss
your options with our team.

Note that the robots.txt currently has a commented out
Crawl-delay:

## *at least* 1 second please, preferably more :D
## we're disabling this experimentally 11-09-2006
#Crawl-delay: 1

Please be sure to use an intelligent non-zero delay
regardless.

6.2 Doing Hadoop MapReduce on the
Wikipedia current database dump

You can do Hadoop MapReduce queries on the current
database dump, but you will need an extension to the
InputRecordFormat to have each <page> </page> be a single
mapper input. A working set of Java methods (jobControl,
mapper, reducer, and XmlInputRecordFormat) is available at
Hadoop on the Wikipedia.

6.3 Doing SQL queries on the current
database dump

You can do SQL queries on the current database dump
(as a replacement for the disabled Special:Asksql page).

7 Database schema

7.1 SQL schema

See also: mw:Manual:Database layout

The sql file used to initialize a MediaWiki database can
be found here.

7.2 XML schema

The XML schema for each dump is defined at the top of
the file, and is also described in the MediaWiki export help
page.

8 Help parsing dumps for use in 
scripts 

• Wikipedia:Computer help desk/ParseMediaWikiDump
describes the Perl Parse::MediaWikiDump library, which
can parse XML dumps.

• Wikipedia preprocessor (wikiprep.pl) is a Perl script 
that preprocesses raw XML dumps and builds link 
tables, category hierarchies, collects anchor text for 
each article etc. 

• Wikipedia SQL dump parser is a .NET library 
to read MySQL dumps without the need to use 
MySQL database 

• Dictionary Builder is a Java program that can parse 
XML dumps and extract entries in files 
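
For small scripts, a dump can also be read directly with Python's standard
library. The sketch below is not one of the tools listed above. It streams
<page> elements and yields each title with its text, assuming the usual
export layout of <page> containing <title> and <revision><text>; the
function names are illustrative.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{...export-0.10/}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> in an export-format XML stream."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the finished page's children
```

Passing a stream opened with bz2.open(path, "rb") lets this run on a
compressed dump without expanding it. If a page holds several revisions,
only the text of the last one read is returned.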

9 Help importing dumps into 
MySQL 

See: 

• mw:Manual:Importing XML dumps 

• m:Data_dumps 

10 Static HTML tree dumps for 
mirroring or CD distribution 

MediaWiki 1.5 includes routines to dump a wiki to 
HTML, rendering the HTML with the same parser used 
on a live wiki. As the following page states, putting one 
of these dumps on the web unmodified will constitute a 
trademark violation. They are intended for private view- 
ing in an intranet or desktop installation. 

• If you want to draft a traditional website in MediaWiki
and dump it to HTML format, you might want to try
mw2html by User:Connelly.

• If you'd like to help develop dump-to-static HTML
tools, please drop us a note on the developers' mailing
list.

• Static HTML dumps are now available here, but are 
not current. 

See also: 

• mw:Alternative parsers lists some other, non-working
options for getting static HTML dumps

• Wikipedia:Snapshots 

• Wikipedia:TomeRaider database 

• http://sdict.com hosts a January 2007 snapshot in 
the open source Sdictionary .dct format 

• http://ahuv.net/wikipedia hosts October 2010 pro- 
cessed snapshot in the freeware MDict .mdx format 

10.1 Kiwix 

Kiwix: the last update of English Wikipedia was July 2014
(http://download.kiwix.org/zim/wikipedia/wikipedia_en_all_07_2014.zim;
use the binaries at http://download.kiwix.org/bin/0.9/ for it
to work perfectly).



10.2 Aard Dictionary 

Aard Dictionary is an offline Wikipedia reader. It does not
display images. It is cross-platform (Windows, Mac, Linux,
Android, Maemo) and runs on rooted Nook and Sony PRS-T1
e-book readers. See https://github.com/aarddict



10.3 E-book 

The wiki-as-ebook store provides ebooks created from a 
large set of Wikipedia articles with grayscale images for 
e-book-readers (2013). 

10.4 Wikiviewer for Rockbox 

The wikiviewer plugin for Rockbox permits viewing converted
Wikipedia dumps on many Rockbox devices. It needs a custom
build and conversion of the wiki dumps using the instructions
available at http://www.rockbox.org/tracker/4755. The
conversion recompresses the file and splits it into 1 GB files
and an index file, which all need to be in the same folder on
the device or microSD card.



10.5 Old dumps 

• The static version of Wikipedia created by Wikimedia:
http://static.wikipedia.org/ (as of Feb. 11, 2013, this is
apparently offline; there was no content).

• Wiki2static (site down as of October 2005) was an
experimental program set up by User:Alfio to generate
HTML dumps, inclusive of images, search function and
alphabetical index. At the linked site experimental dumps
and the script itself can be downloaded. As an example it
was used to generate these copies of English WikiPedia
24 April 04, Simple WikiPedia 1 May 04 (old database
format) and English WikiPedia 24 July 04, Simple
WikiPedia 24 July 04, WikiPedia Francais 27 Juillet 2004
(new format). BozMo uses a version to generate periodic
static copies at fixed reference.

11 Dynamic HTML generation 
from a local XML database 
dump 

Instead of converting a database dump file to many pieces 
of static HTML, one can also use a dynamic HTML gen- 
erator. Browsing a wiki page is just like browsing a Wiki 
site, but the content is fetched and converted from a local 
dump file upon request from the browser. 

11.1 XOWA 

XOWA is a free, open-source application that lets you 
download Wikipedia to your computer. Access all of 
Wikipedia offline — without an internet connection! It 
is currently in the beta stage of development, but is func- 
tional. It is available for download here. 

11.1.1 Features 

• Displays all articles from Wikipedia without an in- 
ternet connection. 

• Download a complete, recent copy of English 
Wikipedia. 

• Display 4.4+ million articles in full HTML format- 
ting. 

• Show images within an article. Access 3.7+ million 
images using the offline image databases. 

• Works with any Wikimedia wiki, including 
Wikipedia, Wiktionary, Wikisource, Wikiquote, 
Wikivoyage (also some non-wmf dumps) 



• Works with any non-English language wiki such
as French Wikipedia, German Wikisource, Dutch
Wikivoyage, etc.

• Works with other specialized wikis such as Wikidata,
Wikimedia Commons, Wikispecies, or any
other MediaWiki-generated dump

• Set up over 660+ other wikis including:

• English Wiktionary

• English Wikisource

• English Wikiquote

• English Wikivoyage

• Non-English wikis, such as French Wiktionary,
German Wikisource, Dutch Wikivoyage

• Wikidata

• Wikimedia Commons

• Wikispecies

• ... and many more!

• Update your wiki whenever you want, using
Wikimedia's database backups.

• Navigate between offline wikis. Click on "Look up
this word in Wiktionary" and instantly view the page
in Wiktionary.

• Edit articles to remove vandalism or errors.

• Install to a flash memory card for portability to other
machines.

• Run on Windows, Linux and Mac OS X.

• View the HTML for any wiki page.

• Search for any page by title using a Wikipedia-like
Search box.

• Browse pages by alphabetical order using
Special:AllPages.

• Find a word on a page.

• Access a history of viewed pages.

• Bookmark your favorite pages.

• Downloads images and other files on demand (when
connected to the internet)

• Sets up Simple Wikipedia in less than 5 minutes

• Can be customized at many levels: from keyboard
shortcuts to HTML layouts to internal options



11.2 Offline wikipedia reader 

(for Mac OS X, GNU/Linux, 

FreeBSD/OpenBSD/NetBSD, and other Unices) 

The offline-wikipedia project provides a very effective 
way to get an offline version of Wikipedia. It uses en- 
tirely free software. Packages are available for Ubuntu 
and soon for other Linux distributions. 

11.2.1 Main features 

1. Very fast searching

2. Keyword (actually, title words) based searching 

3. Search produces multiple possible articles: you can 
choose amongst them 

4. LaTeX based rendering for mathematical formulae 

5. Minimal space requirements: the original .bz2 file 
plus the index 

6. Very fast installation (a matter of hours) compared 
to loading the dump into MySQL 

11.3 WikiFilter 

WikiFilter is a program which allows you to browse over 
100 dump files without visiting a Wiki site. 

11.3.1 WikiFilter system requirements 

• A recent Windows version (WinXP is fine; Win98 
and WinME won't work because they don't have 
NTFS support) 

• A fair bit of hard drive space (To install you will 
need about 12-15 Gigabytes; afterwards you will 
only need about 10 Gigabytes) 

11.3.2 How to set up WikiFilter 

1. Start downloading a Wikipedia database dump file
such as an English Wikipedia dump. It is best to use 
a download manager such as GetRight so you can 
resume downloading the file even if your computer 
crashes or is shut down during the download. 

2. Download XAMPPLITE from (you must get the 
1.5.0 version for it to work). Make sure to pick the 
file whose filename ends with .exe 

3. Install/extract it to C:\XAMPPLITE. 

4. Download WikiFilter 2.3 from this site: http:// 
sourceforge.net/projects/wikifilter. You will have 
a choice of files to download, so make sure that you 
pick the 2.3 version. Extract it to C:\WIKIFILTER. 

5. Copy the WikiFilter.so into your
C:\XAMPPLITE\apache\modules folder.

6. Edit your C:\xampplite\apache\conf\httpd.conf file, 
and add the following line: 

• LoadModule WikiFilter_module 
"C:/XAMPPLITE/apache/modules/WikiFilter. 

7. When your Wikipedia file has finished downloading, 
uncompress it into your C:\WIKIFILTER folder. (I 
used WinRAR http://www.rarlab.com/ demo ver- 
sion - BitZipper http://www.bitzipper.com/winrar. 
html works well too.) 

8. Run WikiFilter (WikiIndex.exe), and go to your
C:\WIKIFILTER folder, and drag and drop the 
XML file into the window, click Load, then Start. 

9. After it finishes, exit the window, and go to your 
C:\XAMPPLITE folder. Run the setup_xampp.bat 
file to configure xampp. 

10. When you finish with that, run the Xampp- 
Control.exe file, and start Apache. 

11. Browse to http://localhost/wiki and see if it works 

• If it doesn't work, see the forums. 



11.4 WikiTaxi 

WikiTaxi is an offline reader for wikis in MediaWiki format.
It enables users to search and browse popular wikis like
Wikipedia, Wikiquote, or WikiNews, without being connected
to the Internet. WikiTaxi works well with different
languages like English, German, Turkish, and others, but has
a problem with right-to-left language scripts. It does not
display images.

11.4.1 WikiTaxi system requirements 

• Any Windows version starting from Windows 95 or 
later. Large File support (greater than 4 GB) for the 
huge wikis (English only at the time of this writing). 

• It also works on Linux with Wine. 

• 16 MB RAM minimum for the WikiTaxi reader, 
128 MB recommended for the importer (more for 
speed). 

• Storage space for the WikiTaxi database. This re- 
quires about 11.7 GiB for the English Wikipedia (as 
of 5 April 2011), 2 GB for German, less for other 
Wikis. These figures are likely to grow in the future. 



11.4.2 WikiTaxi usage 

1. Download WikiTaxi and extract to an empty folder.
No installation is otherwise required. 

2. Download the XML database dump (*.xml.bz2) of 
your favorite wiki. 

3. Run WikiTaxi_Importer.exe to import the database
dump into a WikiTaxi database. The importer takes 
care to uncompress the dump as it imports, so make 
sure to save your drive space and do not uncompress 
beforehand. 

4. When the import is finished, start up WikiTaxi.exe 
and open the generated database file. You can start 
searching, browsing, and reading immediately. 

5. After a successful import, the XML dump file is no 
longer needed and can be deleted to reclaim disk 
space. 

6. To update an offline Wiki for WikiTaxi, download 
and import a more recent database dump. 

For WikiTaxi reading, only two files are required: Wiki- 
Taxi.exe and the .taxi database. Copy them to any storage 
device (memory stick or memory card) or burn them to a 
CD or DVD and take your Wikipedia with you wherever 
you go! 

11.5 BzReader and MzReader (for Win- 
dows) 

BzReader is an offline Wikipedia reader with fast search 
capabilities. It renders the Wiki text into HTML and 
doesn't need to decompress the database. Requires Mi- 
crosoft .NET framework 2.0. 

MzReader by Mun206 works with (though is not affili- 
ated with) BzReader, and allows further rendering of wi- 
kicode into better HTML, including an interpretation of 
the monobook skin. It aims to make pages more readable. 
Requires Microsoft Visual Basic 6.0 Runtime, which is 
not supplied with the download. Also requires Inet Con- 
trol and Internet Controls (Internet Explorer 6 ActiveX), 
which are packaged with the download. 

11.6 EPWING 

An offline Wikipedia database in EPWING dictionary format,
a common but outdated JIS standard in Japan, can be read,
including thumbnail images and tables with some rendering
limitations, on any system where a reader is available
(Boookends). There are many free and commercial readers for
Windows/Mobile, Mac OS X/iOS (Mac, iPhone, iPad), Android,
Unix/Linux/BSD, DOS, and Java-based browser applications
(EPWING Viewers).

12 Mirror Building

12.1 WP-MIRROR

WP-MIRROR is a free utility for mirroring any desired 
set of WMF wikis. That is, it builds a wiki farm that the 
user can browse locally. WP-MIRROR builds a complete 
mirror with original size media files. WP-MIRROR is 
available for download. 

13 See also 

• DBpedia 

• WikiReader 

• m:Export 

• m:Help:Downloading pages 

• m:Import 

• Meta:Data dumps#Other tools, for related tools, e.g. 
extractors and "dump readers" 

• Wikipedia:Wikipedia-CD/Download

• Wikipedia:Size of Wikipedia 

• meta:Mirroring Wikimedia project XML dumps 

• meta:Static version tools

14 References 

[1] Large File Support in Linux 

[2] Android 2.2 and before used YAFFS file system; Decem- 
ber 14, 2010. 

15 External links 

• Wikimedia Downloads. 

• Domas visits logs (read this!). Also, old data in the 
Internet Archive. 

• Wikimedia mailing lists archives. 

• User:Emijrp/Wikipedia Archive. An effort to find
all the Wiki[mp]edia available data, and to encourage
people to download it and save it around the
globe.

• Script to download all Wikipedia 7z dumps. 



16 Text and image sources, contributors, and licenses

16.1 Text

Wikipedia:Database download Source: http://en.wikipedia.org/wiki/Wikipedia%3ADatabase%20download?oldid=654680638 Con- 
tributors: AxelBoldt, WojPob, Lee Daniel Crocker, Brion VIBBER, Eloquence, Mav, Bryan Derksen, The Anome, Stephen Gilbert, 
Ap, Jrincayc, Christian List, Shaihulud, Youandme, Roybadami, Vkem, Elian, Rbrwr, Nealmcb, Patrick, Alfio, Pagingmrherman, 5ko, 
Angela, Kingturtle, Setu, LittleDan, Grin, Samuel, Schneelocke, Kat, Guaka, Timwi, Paul Stansifer, Greenrd, Grendelkhan, RayKiddy, 
Tero, SEWilco, Omegatron, Topbanana, Jamesday, MrJones, Vyasa, R3m0t, Rfcl394, TimR, Hendry, Hadal, JesseW, Diberri, Cyrius, 
Superm401, Pengo, Tobias Bergemann, David Gerard, Connelly, Wizzy, DavidCary, JEvm Arnfjorr) Bjarmason, Msm, Alterego, Filceo- 
laire, Maroux, Avsa, Slyguy, Mboverload, Thrasher6670, Bobblewik, Kubina, Wmahan, Gadfium, Antandrus, BozMo, Beland, Rdsmith4, 
DragonfiySixtyseven, Secfan, Euphoria, Kelson, Chmod007, Grunt, Mike Rosoft, Brianjd, Rich Farmbrough, Pingular, Alistairl978, Ti- 
nus, Demitsu, Night Gyr, Richai'd Taylor, Hapsiainen, Elwikipedista, Zenohockey, Nickj, JRM, Bobol92, Ygfperson, Jkh.gr, Uvarov, 
Scentoni, Brainy J, Ral315, QuantumEleven, Gary, CyberSkull, Andrewpmk, SineSwiper, Water Bottle, Mbloore, Gboudreau, Tony Sid- 
away, RJFJR, Karderio, Feezo, Nuno Tavares, OwenX, Mindmatrix, Commander Keane, Cscott, Triddle, Karam.Anthony.K, Mr Anthem, 
Marudubshinki, Hideyuki, Graham87, Rpwoodbu, David Levy, Ej, Rjwilmsi, CraSH, Salix alba, Nneonneo, S Schaffter, Brighterorange, 
Yug, DiamondDave, Peter. r. bailey, Stephantom, DVdm, Peterl, Roboto de Ajvol, Toes, Armistej, The Storm Surfer, Splash, Cambridge- 
BayWeather, Gustavb, Nbrouard, Robchurch, JPMcGrath, EEMIV, BOT-Superzerocool, Emijip, Open2universe, Cloos, Y23, Alasdair, 
John Broughton, Just2fatty, David Wahler, Blinklmc, Wanchun, Reedy, Unforgettableid, Emj, MasterofUnvrs314, Jmax-, Gabr, AlexJP, 
Kirils, DMacks, Jeremyb, Paulpro, Acdx, ArielGlenn, QwertyO, Platonides, Microchip()8, Mauro Bieg, IvanLanin, AbsolutDan, Phauly, 
WeggeBot, Alexey Petrov, MC10, Meno25, Richardguk, Xtv, RocketOOO, Lx45803, Jacnoc, Jed, Cool Blue, Betal6, Magioladitis, Virtlink, 
Gwem, Sir Intellegence, Ztobor, NAHID, Mirek2, Jim.henderson, Mausy5043, Jane Q. Public, Tgeairn, McSly, Mikael Haggstrom, 
JamesMayfield, Robertgreer, KylieTastic, Funandtrvl, VolkovBot, Flyingtoasterl337, Muro de Aguas, Qxz, Seb26, Strategist333, Ah- 
ban, Blurpeace, Hilarionymous, Gerakibot, Android Mouse, jlfjjCjOS, Lightmouse, Svick, Thegeorgewashington, Invertzoo, Netopyr-e, 
SfanOO IMG, Tanvir Ahmmed, Insomniac, Nnemo, Wwefan981, Wvarner, Zaharous, Lartoven, Sun Creator, Micronie, LobStoR, Per- 
chy22, Stickee, LatitudeBot, Caseybales, Scientus, WikiTaxi, Dayewalker, OlEnglish, Legobot, Donthedev, Kirov Airship, Obersach- 
sebot, Antime, Reality006, Flokarti, Ryankearney, Jonesey95, Hellknowz, Yellow TuRbAnZ, MastiBot, Dudel818, Babilen, Jonkerz, 
Lotje, Dc987, Merlininthewoods, Zzusse, Visite fortuitement prolongee, Jesse V., I<_jjta, DavideSetti, Norlesh, Markiij, NeeDonation, 
Deagle AP, John of Reading, MirekDve, Namnguyenvn, SporkBot, Tolly4bolly, Sbmeirow, Scientific29, Rich Smith, Robin Mathew Ra- 
jan, RaptorHunter, Charliegreenl, VolodymyrB, Ngoisaotinhyeu valleylove, Teleshoes, Chmarkine, Vestai'3000, Mohsen haqiqat 1955, 
Serverbuffer, FrostyCee, Ikesham, Julia Minguillon, Ragul karan singh, DavidPKendal, Yamaha5, JohannesBuchner, Lethosor, Rybec, 
Wikiuserl3, BovineJoni, Alexey SVN, Meteor sandwich yum, KpyineBJbaHHH HBaH, CyberXRef, TPMoyerOOO, Wp mirror, Reticulated 
Spline and Anonymous: 302 

Creative Commons Attribution-Share Alike 3.0



